Genetic Algorithm Based Probabilistic Motif Discovery in Unaligned Biological Sequences
Abstract
Finding motifs in biological sequences is one of the most important primitive operations in computational biology. A motif discovery algorithm faces substantial computational demands, notably memory requirements and computational complexity. To reduce this complexity, we propose an alternative solution that integrates a genetic algorithm with the Fuzzy ART machine learning approach, eliminating the multiple sequence alignment step.
Problem statement: More than a hundred methods have been proposed for motif discovery in recent years, varying widely in both their algorithmic approaches and their underlying models of regulatory regions. The aim of this study was to develop an alternative solution for motif discovery that benefits from both data mining and genetic algorithms while eliminating the cost of multiple sequence alignment.
Approach: A genetic algorithm based probabilistic motif discovery model was designed to solve the problem. The proposed algorithm was implemented in Matlab and tested on large DNA sequence data sets and on synthetic data sets.
Results: The proposed model was compared with an existing method in terms of speed and motif length. The proposed method found a motif of length 11 in 18 s and one of length 15 in 24 s, whereas the existing method found a motif of length 11 in 34 s. On these measures, the proposed method outperformed the popular existing method.
Conclusion: In this study, we proposed a model that discovers motifs in large sets of unaligned sequences in considerably less time, and the motifs it finds are also longer. The proposed algorithm was implemented in Matlab and tested on large DNA sequence data sets and synthetic data sets.
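The abstract does not describe the algorithm's internals. As a rough illustration only, here is one common way a genetic algorithm can search for a motif in unaligned sequences without aligning them. Each chromosome is a vector of candidate start positions, one per sequence, and fitness is the information content of the position weight matrix (a probabilistic motif model) built from the selected windows. The encoding, the information-content fitness, the tournament/one-point-crossover operators and every function name (`ga_motif_search`, `build_pwm`, `fitness`, etc.) are assumptions for illustration. This is not the authors' Matlab implementation, and the Fuzzy ART component is not modeled.

```python
import math
import random

ALPHABET = "ACGT"

# ASSUMPTION: a hypothetical sketch, not the paper's method.
# Chromosome = list of start positions (one per sequence); the motif model is a PWM.

def build_pwm(seqs, starts, w, pseudo=0.5):
    """Column-wise base probabilities of the length-w windows at starts[i] in seqs[i]."""
    pwm = []
    for col in range(w):
        counts = {b: pseudo for b in ALPHABET}  # pseudocounts avoid log(0)
        for s, p in zip(seqs, starts):
            counts[s[p + col]] += 1
        total = sum(counts.values())
        pwm.append({b: c / total for b, c in counts.items()})
    return pwm

def fitness(seqs, starts, w, background=0.25):
    """Information content (bits) of the PWM against a uniform background."""
    pwm = build_pwm(seqs, starts, w)
    return sum(p * math.log2(p / background) for col in pwm for p in col.values())

def random_individual(seqs, w, rng):
    return [rng.randrange(len(s) - w + 1) for s in seqs]

def crossover(a, b, rng):
    """One-point crossover on the start-position vectors."""
    if len(a) < 2:
        return a[:]
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(ind, seqs, w, rate, rng):
    """Redraw each start position independently with probability `rate`."""
    return [rng.randrange(len(s) - w + 1) if rng.random() < rate else p
            for s, p in zip(seqs, ind)]

def tournament(pop, scores, rng, k=3):
    picks = rng.sample(range(len(pop)), min(k, len(pop)))
    return pop[max(picks, key=lambda i: scores[i])]

def ga_motif_search(seqs, w, pop_size=60, generations=200, mut_rate=0.1, seed=0):
    """Return (best start positions, best fitness, motif instances)."""
    rng = random.Random(seed)
    pop = [random_individual(seqs, w, rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for gen in range(generations + 1):
        scores = [fitness(seqs, ind, w) for ind in pop]
        for ind, sc in zip(pop, scores):
            if sc > best_score:
                best, best_score = ind[:], sc
        if gen == generations:
            break  # final population scored; no further breeding
        new_pop = [best[:]]  # elitism: keep the best chromosome unchanged
        while len(new_pop) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            new_pop.append(mutate(child, seqs, w, mut_rate, rng))
        pop = new_pop
    motifs = [s[p:p + w] for s, p in zip(seqs, best)]
    return best, best_score, motifs

if __name__ == "__main__":
    # Synthetic test: plant a hypothetical motif in random DNA sequences.
    rng = random.Random(42)
    planted = "TGACGTCA"
    seqs = []
    for _ in range(8):
        s = [rng.choice(ALPHABET) for _ in range(60)]
        pos = rng.randrange(60 - len(planted) + 1)
        s[pos:pos + len(planted)] = planted
        seqs.append("".join(s))
    starts, score, motifs = ga_motif_search(seqs, len(planted))
    print(round(score, 2), motifs)
```

The synthetic run mirrors the abstract's use of synthetic data sets. Whether the planted motif is fully recovered depends on the population size, number of generations, mutation rate and seed. Elitism guarantees that the best fitness never decreases from one generation to the next.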
Similar resources
Genetic Algorithm Based Probabilistic Motif Discovery in Multiple Unaligned Biological Sequences
Many computational approaches have been introduced for the problem of motif identification in a set of biological sequences, which are classified according to the type of motifs discovered. In this study, we propose a model to discover motifs in large sets of unaligned sequences in considerably less time using a genetic algorithm based probabilistic motif discovery model. The proposed algorith...
Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences
This work presents a hybrid method for motif discovery in DNA sequences. The proposed method called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses the stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...
A Combinatorial Approach for Motif Discovery in Unaligned DNA Sequences
Motif (conserved pattern) modelling and finding in unaligned DNA sequences is a fundamental problem in computational biology with important applications in understanding gene regulation. Biological approaches for this problem are tedious and time-consuming. Large amounts of genome sequence data and gene expression micro-array data let us solve this problem computationally. Most computer science...
ARCS-Motif: discovering correlated motifs from unaligned biological sequences
MOTIVATION: The goal of motif discovery is to detect novel, unknown, and important signals from biological sequences. In most models, the importance of a motif is equal to the sum of the similarity of every single position. In 2006, Song et al. introduced the Aggregated Related Column Score (ARCS) measure, which includes correlation information in the evaluation of motif importance. The paper showed tha...
Introduction to Computational Biology Lecture # 10: Motif Discovery
In the previous lessons we learned about a probabilistic model that describes biological sequences: the Hidden Markov Model (HMM). We have also learned how to estimate the parameters of a specific HMM with an expectation-maximization (EM) algorithm. In this lesson we will use these methods to solve the specific problem of motif discovery. Our goal is to find a word that appears in a non-conserved place ...